Foundations of Sampling

Randomization

Elizabeth King
Kevin Middleton

Sampling from data sets: foundations

  • Jackknife
  • Bootstrap
  • Randomization

What is randomization?

  • A technique for generating an empirical null distribution by repeatedly putting the data in a random order
  • Primarily for hypothesis testing
    • Wide applicability
    • Few assumptions
    • Results may not generalize to a broader set of populations

Assumption: Under the null hypothesis, observations are random draws from a common population

The General Procedure

  1. Decide on a test statistic
  2. Calculate the test statistic for the observed data
  3. Randomly shuffle the observations
  4. Calculate the test statistic for the shuffled data
  5. Repeat many times to generate an empirical null distribution
  6. Determine the proportion of random combinations resulting in a test statistic more extreme than the observed value (“empirical P”)
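The six steps above can be sketched as a small R function. Everything here is illustrative (the function name, the toy data, and the two-group setup are not from the examples that follow); the test statistic is a simple difference in group means.

```r
# Randomization test for a difference in two group means (steps 1-6 above).
# x: numeric outcome; g: two-level grouping vector.
randomization_test <- function(x, g, niter = 1000) {
  stat <- function(x, g) {
    m <- tapply(x, g, mean)   # per-group means
    unname(m[1] - m[2])       # test statistic: difference in means
  }
  obs <- stat(x, g)                                   # observed statistic
  null_dist <- replicate(niter, stat(sample(x), g))   # shuffle outcome, recompute
  # Two-sided empirical P: proportion at least as extreme as the observed
  p <- (sum(abs(null_dist) >= abs(obs)) + 1) / (niter + 1)
  list(observed = obs, p = p)
}

set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 2))
g <- rep(c("a", "b"), each = 10)
res <- randomization_test(x, g)
```

Only the outcome is shuffled; the group labels stay in place, which is equivalent to shuffling the labels themselves.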

Differences in jackal mandibles

Hierarchy of GLMs

  • Randomization can be applied to any of these and beyond

Example 1: One-way ANOVA

  • Does bright light treatment alleviate jet lag symptoms?
  • 3 groups
    • No light (control)
    • Bright light in eyes
    • Bright light in knees
  • Outcome
    • Shift in circadian pattern (hours)

Does bright light treatment alleviate jet lag symptoms?

Decide on a Test Statistic

  • Overall effect: F-statistic
  • Differences between groups: differences in means
# Pairwise differences in group means
d_EK <- JL |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
   JL |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)

d_EC <- JL |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
   JL |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m)

d_CK <- JL |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m) -
   JL |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)

# Overall effect: F statistic from a one-way ANOVA
fm_lm <- anova(lm(Shift ~ Treatment, data = JL))
obsF <- fm_lm$`F value`[1]

obs <- tibble(Fstat = obsF, d_EK = d_EK, 
              d_EC = d_EC, d_CK = d_CK)

obs
# A tibble: 1 × 4
  Fstat  d_EK  d_EC   d_CK
  <dbl> <dbl> <dbl>  <dbl>
1  7.29 -1.22 -1.24 0.0270

Randomly shuffle the observations

Rows: 22
Columns: 2
$ Treatment <fct> control, control, control, control, control, control, contro…
$ Shift     <dbl> 0.53, 0.36, 0.20, -0.37, -0.60, -0.64, -0.68, -1.27, 0.73, 0…

  • Shuffle Shift to randomly associate each value with one of our three treatment categories
  • Do this 1,000 times and calculate our 4 observed statistics
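A single shuffle just permutes the outcome values while the treatment labels stay in place; a toy illustration using the first few Shift values shown above:

```r
set.seed(5)
shift <- c(0.53, 0.36, 0.20, -0.37, -0.60)  # first few Shift values
shuffled <- sample(shift)                    # same values, random order
```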

Repeat many times to generate an empirical null distribution

set.seed(6445287)
niter <- 1000

# Set up a tibble for the output
rand.out <- tibble(Fstat = rep(NA, niter), d_EK = rep(NA, niter),
                   d_EC = rep(NA, niter), d_CK = rep(NA, niter))

# Add our observed statistics as the first row
rand.out[1, ] <- obs

for (ii in 2:niter) {
  # Shuffle Shift to break its association with Treatment
  JL.s <- JL
  JL.s$Shift <- sample(JL$Shift)

  d_EK <- JL.s |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
    JL.s |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)

  d_EC <- JL.s |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
    JL.s |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m)

  d_CK <- JL.s |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m) -
    JL.s |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)

  fm_lm <- anova(lm(Shift ~ Treatment, data = JL.s))
  rand.out[ii, ] <- tibble(Fstat = fm_lm$`F value`[1], d_EK = d_EK,
                           d_EC = d_EC, d_CK = d_CK)
}

Empirical P-Value

  • Determine the proportion of random combinations resulting in a test statistic more extreme than the observed value (“empirical P”)

\[P_e = \frac{r + 1}{n + 1}\]

where \(r\) is the number of randomized estimates equal to or more extreme than the observed value, and \(n\) is the number of randomizations.
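The formula translates directly into code; the randomized statistics below are simulated stand-ins for a real null distribution.

```r
# Empirical P following P_e = (r + 1) / (n + 1)
empirical_p <- function(obs, rand_stats) {
  r <- sum(abs(rand_stats) >= abs(obs))  # equal or more extreme (two-sided)
  n <- length(rand_stats)                # number of randomizations
  (r + 1) / (n + 1)
}

set.seed(2)
p <- empirical_p(obs = 2.5, rand_stats = rnorm(999))
```

The +1 in the numerator and denominator counts the observed value itself, so \(P_e\) can never be exactly zero.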

Empirical P-Value

  • If the observed value is stored as the first row of your randomized set, you can calculate the proportion directly
sum(rand.out$Fstat >= rand.out$Fstat[1])/nrow(rand.out)
[1] 0.004

# For a two-sided test, use the absolute value of the difference
apply(rand.out, 2, function(x) sum(abs(x) >= abs(x[1]))/length(x))
Fstat  d_EK  d_EC  d_CK 
0.004 0.005 0.006 0.949 
  • Stay tuned for using randomization to account for multiple testing

Example 2: Linear Regression

  • Movement rates and heart rates of black bears in Minnesota (from the Data4Ecologists package)
Rows: 2,768
Columns: 2
$ log.move <dbl> 2.9391619, 0.5247285, -0.5960205, -0.3754210, -0.6386590, 0.5…
$ hr       <int> 60, 54, 34, 37, 37, 41, 44, 55, 58, 51, 47, 42, 43, 51, 48, 5…

Does movement predict heart rate?

Decide on a test statistic

mod <- lm(hr ~ log.move, data = bearmove)
summary(mod)

Call:
lm(formula = hr ~ log.move, data = bearmove)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.127 -11.182   0.196  10.808  61.588 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  58.5514     0.6284   93.17   <2e-16 ***
log.move      2.7134     0.1463   18.55   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.68 on 2766 degrees of freedom
Multiple R-squared:  0.1106,    Adjusted R-squared:  0.1103 
F-statistic: 344.1 on 1 and 2766 DF,  p-value: < 2.2e-16
t_obs <- summary(mod)$coefficients[2,3]
t_obs
[1] 18.54867
  • The slope estimate
  • t statistic for log.move

Randomly shuffle to generate an empirical null distribution

set.seed(746133)
niter <- 1000
rand.out <- tibble(Tstat = rep(NA,niter)) 
rand.out[1,] <- t_obs

for (ii in 2:niter) {
  bear.s <- bearmove[, c("log.move", "hr")]
  # Shuffle the predictor to break its association with heart rate
  bear.s$log.move <- sample(bear.s$log.move)
  mod <- lm(hr ~ log.move, data = bear.s)
  # Keep the t statistic for the slope
  rand.out[ii, ] <- summary(mod)$coefficients[2, 3]
}

Repeat many times to generate an empirical null distribution

sum(abs(rand.out$Tstat) >= abs(rand.out$Tstat[1]))/nrow(rand.out)
[1] 0.001
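The same shuffle-and-refit loop can be written more compactly with replicate(). The simulated data below stands in for bearmove so the sketch is self-contained; because there is no true slope here, the empirical P should be unremarkable.

```r
set.seed(3)
# Simulated stand-in for the bear data (no true relationship under the null)
dat <- data.frame(log.move = rnorm(100), hr = rnorm(100, mean = 50, sd = 15))
t_obs <- summary(lm(hr ~ log.move, data = dat))$coefficients[2, 3]

# Shuffle the predictor, refit, and keep the slope's t statistic
t_null <- replicate(999, {
  d <- transform(dat, log.move = sample(log.move))
  summary(lm(hr ~ log.move, data = d))$coefficients[2, 3]
})
p_emp <- (sum(abs(t_null) >= abs(t_obs)) + 1) / (length(t_null) + 1)
```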

Example 3: Paired t-test

  • Ovary sizes for pairs of sisters reared on two food types
Rows: 40
Columns: 2
$ TypeA <dbl> 196.7824, 191.7938, 205.7433, 201.5731, 210.2526, 175.1925, 186.…
$ TypeB <dbl> 207.3581, 204.8712, 193.9763, 200.8164, 212.5679, 184.3298, 209.…

Decide on a test statistic

  • t statistic
  • average of the differences between pairs

    Paired t-test

data:  dd$TypeA and dd$TypeB
t = -0.62243, df = 39, p-value = 0.5373
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -5.411935  2.864956
sample estimates:
mean difference 
       -1.27349 
obs_D <- mean(dd$TypeA - dd$TypeB)
obs_D
[1] -1.27349

Randomly shuffle the observations

  • Keep pairs together
  • For each pair, randomly assign type A or type B label
  • Get difference
dd_diffs <- tibble(AB = dd$TypeA - dd$TypeB,
                   BA = dd$TypeB - dd$TypeA)

# For each pair, randomly keep either the A-B or the B-A difference
dd.s <- apply(dd_diffs, 1, function(x) sample(x, 1))

dd_diffs[1:5,]
# A tibble: 5 × 2
       AB      BA
    <dbl>   <dbl>
1 -10.6    10.6  
2 -13.1    13.1  
3  11.8   -11.8  
4   0.757  -0.757
5  -2.32    2.32 
dd.s[1:5]
[1] -10.5757403  13.0774640 -11.7670480   0.7567086   2.3153078

Randomly shuffle the observations

dd_diffs <- tibble(AB = dd$TypeA - dd$TypeB,
                   BA = dd$TypeB - dd$TypeA)

set.seed(765667)
niter <- 1000
rand.out <- tibble(Diffs = rep(NA,niter)) 
rand.out[1,] <- obs_D

for (ii in 2:niter) {
  # Randomly flip the sign of each pair's difference, then average
  dd.s <- apply(dd_diffs, 1, function(x) sample(x, 1))
  rand.out[ii, ] <- mean(dd.s)
}

Repeat many times to generate an empirical null distribution

sum(abs(rand.out$Diffs) >= abs(rand.out$Diffs[1]))/nrow(rand.out)
[1] 0.332

General Considerations

  • Appropriate test statistic
  • How to randomize the order?
  • Number of permutations
  • Limits in small samples or few groups
    • Very small sets: do all possible permutations
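For a toy two-group example, combn() enumerates every possible assignment of the group labels, giving an exact permutation P (all values here are made up for illustration):

```r
# Exact permutation test: enumerate all choose(6, 3) = 20 ways to pick
# which 3 of 6 observations carry the "treatment" label
x <- c(1.2, 2.3, 1.9, 4.1, 3.8, 4.5)        # first 3 control, last 3 treatment
obs <- mean(x[4:6]) - mean(x[1:3])          # observed difference in means

idx <- combn(6, 3)                          # each column: one possible treatment group
all_diffs <- apply(idx, 2, function(i) mean(x[i]) - mean(x[-i]))
p_exact <- mean(abs(all_diffs) >= abs(obs)) # exact two-sided P
```

Here only the observed split and its mirror image are as extreme as the observed difference, so p_exact is 2/20 = 0.1.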

Decide on a test statistic

  1. Mean difference
  2. t: t-test, linear model parameter estimate (slope, intercept)
  3. F: ANOVA-like
  4. \(\chi^2\)
  5. Any metric of your choice (P-value, Fst, heterozygosity, LOD score, etc.)
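As one illustration of a statistic beyond mean differences, a chi-squared statistic can be randomized by shuffling one set of category labels (toy data below; shuffling preserves both margins, so expected counts do not change):

```r
set.seed(6)
# Toy categorical data with a built-in association between g1 and g2
g1 <- rep(c("x", "y"), c(30, 20))
g2 <- rep(c("p", "q", "p"), c(25, 20, 5))

obs_chi <- chisq.test(table(g1, g2))$statistic
# Shuffle g1 and recompute the chi-squared statistic each time
null_chi <- replicate(999, chisq.test(table(sample(g1), g2))$statistic)
p_emp <- (sum(null_chi >= obs_chi) + 1) / 1000
```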

How to randomize the order?

  • What is the null hypothesis?
  • Are there relationships in the data that need to be maintained?
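For example, if observations come in blocks that must be kept intact (a hypothetical blocked design), shuffle within blocks rather than across the whole data set:

```r
set.seed(4)
# Hypothetical blocked data: permute the outcome within each block only,
# so block-level structure is preserved under the null
dat <- data.frame(block = rep(c("b1", "b2"), each = 4),
                  y = 1:8)
dat$y_shuffled <- ave(dat$y, dat$block, FUN = sample)
```

This is the same logic as keeping pairs together in the paired design above, generalized to blocks of any size.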

Empirical P & Iterations

What is the minimal detectable P for n iterations?

    n   nreps     min_P
    2      18 0.1111111
    4      63 0.0317460
    6     219 0.0091324
    8     756 0.0026455
   10    2604 0.0007680
   12    8966 0.0002231
   14   30865 0.0000648
   16  106246 0.0000188
   18  365727 0.0000055
   20 1258925 0.0000016

Empirical P & Iterations

  • Randomization is for comparing competing hypotheses
  • For empirical P-values with few cases more extreme than the observed, do more iterations if you want an exact value
  • For empirical P-values near your critical value, do more iterations to increase confidence in your conclusion
  • In general, treat empirical P-values as measures of the strength of evidence (not all or nothing)

Non-parametric tests

Non-parametric tests are often used when data do not meet the assumptions of a traditional (parametric) test:

  • One-sample t-test \(\rightarrow\) Sign test, Wilcoxon test
  • Two-sample t-test \(\rightarrow\) Mann-Whitney test
  • ANOVA \(\rightarrow\) Kruskal-Wallis

They are typically chosen for small sample sizes, non-normality, or unequal variances, but they can have dramatically lower power than the corresponding parametric test.

Randomization as an alternative

For all practical cases, randomization is a better alternative

  • Increased power
  • No reliance on asymptotic properties of tests
  • More relaxed assumptions

Common Use Cases

  • Hypothesis tests (most common)
    • Any scenario with a defined hypothesis and null hypothesis
  • Interval estimation (less common)